Abstract: In this paper, we consider the problem of dynamic
programming when supremum terms appear in the objective
function. Such terms can represent overhead costs associated
with the underlying state variables. Specifically, this form of
optimization problem can be used to represent optimal scheduling
of batteries such as the Tesla Powerwall for electrical consumers
subject to demand charges - a charge based on the maximum
rate of electricity consumption. These demand charges reflect
the cost to the utility of building and maintaining generating
capacity. Unfortunately, we show that dynamic programming
problems with supremum terms do not satisfy the principle of
optimality. However, we also show that the supremum is a
special case of the class of forward separable objective functions.
To solve the dynamic programming problem, we propose a
general class of optimization problems with forward separable
objectives. We then show that for any problem in this class,
there exists an augmented-state dynamic programming problem
which satisfies the principle of optimality and the solutions to
which yield solutions to the original forward separable problem.
We further generalize this approach to stochastic dynamic
programming problems and apply the results to the problem of
optimal battery scheduling with demand charges using a
data-based stochastic model for electricity usage and solar
generation by the consumer.

I. Introduction 

In 2012, 95,000 new distributed solar photovoltaic (PV)
systems were installed nationally, a 36% increase from 2011,
yielding a total of approximately 300,000 installations
[1]. Further, utility-scale PV generating capacity has
increased at an even faster rate, with 2012 installations
more than doubling those of 2011 [2]. Meanwhile, partially
due to the development of energy-efficient appliances and
new materials for insulation, US electricity demand has
plateaued [3]. As a consequence of these trends, utility
companies are faced with the problem that demand peaks
continue to grow. Specifically, as per the US EIA [4], the
ratio of peak demand to average demand has increased
dramatically over the last 20 years.

Fundamentally, the problem faced by utilities is that
consumers are typically charged based on total electricity
consumption, while utility costs depend both on consumption
and on building and maintaining the generating capacity
necessary to meet peak demand. Recently, several public
and private utilities have moved to address this imbalance
by charging residential consumers based on the maximum
rate ($ per kW) of consumption - a cost referred to as a
demand charge. Specifically, in Arizona, both major utilities
SRP and APS have mandatory demand charges for residential
consumers [5].


For consumers, load is relatively inflexible and hence the
most direct approach to minimizing the effect of demand
charges is the use of battery storage devices such as the
Tesla Powerwall [6], [7], [8]. These devices allow consumers
to shift electricity consumption away from periods of peak
demand, thereby minimizing the effect of demand charges. In
this paper, we specifically focus on battery storage coupled
with HVAC and solar generation. This is due to the fact that
load from HVAC and electricity from solar generation can
be forecast well a priori.

The use of battery storage has been well documented in the 
literature [9] and in particular, there have been several results 
on the optimal use of batteries for residential customers [10], 
[11], [12], [13]. Within this literature, there are relatively 
few results which include demand charges. Of those which 
do treat demand charges, we mention [14] which proposes a 
heuristic form of dynamic programming, and the recent work 
in [15], wherein the optimization problem is broken down
into several agents, and a Lagrangian approach is used to
perform the optimization. Furthermore, in [16] a similar energy
storage problem is solved using optimized curtailment and
load shedding. An $L_p$ approximation of the demand charge
was used in combination with multi-objective optimization
in [17] and, in addition, the optimal use of building mass for
energy storage was considered in [18], wherein a bisection on
the demand charges was used. However, we note that none
of these approaches resolve the fundamental mathematical 
problem of dynamic programming with a non-separable cost 
function and hence are either inaccurate, computationally 
expensive, or are not guaranteed to converge. Finally, we 
note that there has been no work to date on optimization 
of demand charges coupled with stochastic models of solar 
generation. 

In this paper, we formulate the battery storage problem as 
a dynamic program with an objective function consisting of 
both integrated time-of-use charges and a supremum term 
representing the demand charge. Furthermore, we model 
solar generation as a Gauss-Markov process and minimize 
the expected value of the objective. The fundamental
mathematical challenge with dynamic programming problems of
this form is that, as shown in Section II, problems which
include supremum terms in the objective do not satisfy the
principle of optimality and thus recursive solution of the
Bellman equation does not yield an optimal policy.

Dynamic programming for problems which do not satisfy
the principle of optimality has received little attention and
there are few results in the literature in which this problem
has been addressed. The only generalized approach to the
problem seems to be that taken in [19], which considered
the use of multi-objective optimization in the case where
the objective function is “backward separable”. Although
the supremum term is not backward separable, an $L_p$
approximation of the supremum is backward separable and this
approach was applied in [17] to the problem of battery
storage. Although not directly addressed in [19], our approach
is inspired by this result and is based on the observation that
while the supremum is not backward separable, it is “forward
separable”.

To solve forward separable optimization problems, we 
propose in this paper a rigorous approach to a class of gen- 
eralized dynamic programming problems which are formally 
defined in Section II. For this class of problems, we propose 
a precise definition of the principle of optimality and show 
that if this definition holds, then the Bellman equation can be 
used to define an optimal policy. Next, we propose a class 
of forward separable optimization problems and show that 
dynamic programming with integral and supremum terms is 
an element of this class. We then show that the principle 
of optimality fails for certain problems in this class. In 
Section III, we show that for any forward separable dynamic 
programming problem, there exists a separable augmented- 
state dynamic programming problem for which the principle 
of optimality holds and from which solutions to the original 
forward separable problem can be recovered. In Section IV, 
we apply these methods to the battery scheduling problem 
for a given load and solar generation schedule. In Section VI, 
we show that the augmented dynamic programming problem 
can also be used to solve stochastic dynamic programming 
problems with forward separable objectives and apply this 
approach to the battery scheduling problem using a Gauss- 
Markov model of solar generation extracted from data pro- 
vided by local utility SRP. 

II. Background: Generalized Dynamic 
Programming 

In this paper, we consider a generalized class of dynamic 
programming problems. Specifically, we define a generalized 
dynamic programming problem as an indexed sequence of 
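optimization problems $G(t_0,x_0)$, defined by an indexed
sequence of objective functions
$J_{t_0,x_0} : \mathbb{R}^{m \times (T-t_0)} \times \mathbb{R}^{n \times (T-t_0+1)} \to \mathbb{R}$,
where we say that $u^* \in \mathbb{R}^{m \times (T-t_0)}$ and
$x^* \in \mathbb{R}^{n \times (T-t_0+1)}$ solve $G(t_0,x_0)$ if

\[
\begin{aligned}
(u^*,x^*) = \arg\min_{u,x}\; & J_{t_0,x_0}(u,x) \\
\text{subject to: } & x(t+1) = f[x(t),u(t),t], \quad \text{given } x(t_0) = x_0, \\
& x(t) \in X \subset \mathbb{R}^n \text{ for } t = t_0+1,\dots,T, \\
& u(t) \in U \subset \mathbb{R}^m \text{ for } t = t_0,\dots,T-1,
\end{aligned} \tag{1}
\]

where $f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{N} \to \mathbb{R}^n$,
$x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^m$ for
all $t$. We denote $J^*_{t_0,x_0} = J_{t_0,x_0}(u^*,x^*)$.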

Definition 1: We say the sequence of controls
$u = (u(t_0),\dots,u(T-1)) \in \mathbb{R}^{m \times (T-t_0)}$ is feasible if
$u(t) \in U$ for $t = t_0,\dots,T-1$ and if $x(t+1) = f[x(t),u(t),t]$ and
$x(t_0) = x_0$, then $x(t) \in X$ for all $t$. For a given $x$, we denote
by $\Gamma_{t,x}$ the set of $u \in U$ such that $f[x,u,t] \in X$. In this paper
we only consider problems where $\Gamma_{t,x}$ is nonempty for all $x$
and $t$.

Note that for this class of optimization problems,
feasibility is inherited. That is, if $u = (u(t),\dots,u(T-1))$
and $x = (x(t),\dots,x(T))$ are feasible for $G(t,x(t))$
and $v = (v(s),\dots,v(T-1))$ and $h = (h(s),\dots,h(T))$
are feasible for $G(s,x(s))$ where $s > t$, then
$w = (u(t),\dots,u(s-1),v(s),\dots,v(T-1))$ and
$z = (x(t),\dots,x(s-1),h(s),\dots,h(T))$ are feasible for
$G(t,x(t))$.

In certain cases, indexed optimization problems of the
form of $G(t_0,x_0)$ can be solved using an optimal policy.

Definition 2: A policy is any map from the present state
and time to a feasible input, $(x,t) \mapsto u(t) \in \Gamma_{t,x}$, written as
$u(t) = \pi(x,t)$. We say that $\pi^*$ is an optimal policy for Problem (1)
if

\[
u^* = (\pi^*(x_0,t_0),\dots,\pi^*(x^*(T-1),T-1))
\]

where $x^*(t+1) = f[x^*(t),\pi^*(x^*(t),t),t]$ for all $t$.

The existence of an optimal policy states that knowledge 
of the current state is sufficient to determine the current input. 
Existence of such a policy vastly simplifies the optimization 
problem. However, not every generalized dynamic program- 
ming problem admits an optimal policy. The “Principle of 
Optimality” defines one class of optimization problems for 
which there exists an optimal policy. 

Definition 3: We say an optimization problem, $G(t_0,x_0)$,
of the form (1) satisfies the principle of optimality if the
following holds. For any $s$ and $t$ with $t_0 \le t < s < T$, if
$u^* = (u(t),\dots,u(T-1))$ and $x^* = (x(t),\dots,x(T))$ solve $G(t,x(t))$,
then $v = (u(s),\dots,u(T-1))$ and $h = (x(s),\dots,x(T))$ solve
$G(s,x(s))$.

The classical dynamic programming algorithm,
as originally defined in [20], can be used to solve indexed
optimization problems of the form (1). This algorithm has
the advantage of computational complexity which is linear
in $T$.

Dynamic programming algorithms are most commonly
used to solve the special class of indexed optimization
problems $P(t_0,x_0)$ of the form

\[
\begin{aligned}
\min_{u,x}\; & J_{t_0,x_0}(u,x) = \sum_{t=t_0}^{T-1} c_t(x(t),u(t)) + c_T(x(T)) \\
\text{subject to: } & x(t+1) = f[x(t),u(t),t], \quad \text{given } x(t_0) = x_0, \\
& x(t) \in X \text{ for } t = t_0+1,\dots,T, \\
& u(t) \in U \text{ for } t = t_0,\dots,T-1.
\end{aligned} \tag{2}
\]

Note that $J_{T,x} = c_T(x)$. We will refer to $x(t_0) \in \mathbb{R}^n$ as the
initial state, $x(t) \in \mathbb{R}^n$ as the state at time $t$ and $u(t) \in \mathbb{R}^m$
as the inputs at time $t$. $J_{t_0,x_0}$ is the objective function,
$c_t : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ for $t = t_0,\dots,T-1$ and
$c_T : \mathbb{R}^n \to \mathbb{R}$ are given
functions and $f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{N} \to \mathbb{R}^n$ is a given vector field.
The following lemma shows that this class of problems
satisfies the principle of optimality.



Lemma 1: Any problem of the form $P(t_0,x_0)$ in (2) satisfies
the principle of optimality.

Proof: Suppose $u^* = (u(t),\dots,u(T-1))$ and
$x^* = (x(t),\dots,x(T))$ solve $P(t,x(t))$ in (2). Now we suppose by
contradiction that there exists some $s > t$ such that
$v = (u(s),\dots,u(T-1))$ and $h = (x(s),\dots,x(T))$ do not solve
$P(s,x(s))$. We will show that this implies that $u^*$ and
$x^*$ do not solve $P(t,x(t))$ in (2), thus verifying the
conditions of the principle of optimality. If $v$ and $h$ do not
solve $P(s,x(s))$, then there exist feasible $w$, $z$ such that
$J_{s,x(s)}(w,z) < J_{s,x(s)}(v,h)$, i.e.

\[
\begin{aligned}
J_{s,x(s)}(w,z) &= \sum_{t=s}^{T-1} c_t(z(t),w(t)) + c_T(z(T)) \\
&< \sum_{t=s}^{T-1} c_t(x(t),u(t)) + c_T(x(T)) \\
&= J_{s,x(s)}(v,h).
\end{aligned}
\]

Now consider the proposed feasible sequences
$\hat u = (u(t),\dots,u(s-1),w(s),\dots,w(T-1))$ and
$\hat x = (x(t),\dots,x(s-1),z(s),\dots,z(T))$. It follows that

\[
\begin{aligned}
J_{t,x(t)}(\hat u,\hat x) &= \sum_{k=t}^{s-1} c_k(x(k),u(k)) + \sum_{k=s}^{T-1} c_k(z(k),w(k)) + c_T(z(T)) \\
&< \sum_{k=t}^{s-1} c_k(x(k),u(k)) + \sum_{k=s}^{T-1} c_k(x(k),u(k)) + c_T(x(T)) \\
&= J_{t,x(t)}(u^*,x^*),
\end{aligned}
\]

which contradicts optimality of $u^*, x^*$. Therefore, this class
of problems satisfies the principle of optimality. ■

Proposition 1: Consider the class of optimization
problems $P(t_0,x_0)$ in (2). If we define $F(x,t) = J^*_{t,x}$, then the
following hold for all $x \in X$.

\[
\begin{aligned}
F(x,t) &= \inf_{u \in \Gamma_{t,x}} \{ c_t(x,u) + F(f(x,u,t),t+1) \} \quad \forall t \in \{t_0,\dots,T-1\}, \\
F(x,T) &= c_T(x) \quad \forall x \in X.
\end{aligned} \tag{3}
\]

Proof:

Clearly $F(x,T) = c_T(x)$ for any $x$.

Now for any $x \in X$ and $t \in \{t_0,\dots,T-1\}$, suppose
$u^* = (u(t),\dots,u(T-1))$ and $x^* = (x(t),\dots,x(T))$ solve $P(t,x)$. By
the principle of optimality, $v = (u(t+1),\dots,u(T-1))$ and
$h = (x(t+1),\dots,x(T))$ solve $P(t+1,x(t+1))$. Therefore

\[
F(x(t+1),t+1) = J^*_{t+1,x(t+1)} = J_{t+1,x(t+1)}(v,h). \tag{4}
\]

We conclude that

\[
\begin{aligned}
F(x,t) &= J^*_{t,x} \\
&= J_{t,x}(u^*,x^*) \\
&= c_t(x,u(t)) + J_{t+1,x(t+1)}(v,h) \\
&= c_t(x,u(t)) + F(f(x,u(t),t),t+1) \quad \text{using (4)} \\
&\ge \inf_{u \in \Gamma_{t,x}} \{ c_t(x,u) + F(f(x,u,t),t+1) \}
\end{aligned}
\]

holds for all $x$ and $t$.

Now we prove $F(x,t) \le \inf_{u \in \Gamma_{t,x}} \{ c_t(x,u) + F(f(x,u,t),t+1) \}$.
For any $u \in \Gamma_{t,x}$, let $w_u = \{w(t+1),\dots,w(T-1)\}$
and $z_u = \{f(x,u,t),z(t+2),\dots,z(T)\}$ solve
$P(t+1,f(x,u,t))$. Then $\hat w_u = \{u,w(t+1),\dots,w(T-1)\}$ and
$\hat z_u = \{x,f(x,u,t),z(t+2),\dots,z(T)\}$ are feasible for $P(t,x)$.

Therefore,

\[
\begin{aligned}
F(x,t) = J^*_{t,x} &\le J_{t,x}(\hat w_u,\hat z_u) \\
&= c_t(x,u) + J_{t+1,f(x,u,t)}(w_u,z_u) \\
&= c_t(x,u) + F(f(x,u,t),t+1).
\end{aligned}
\]

Since this bound holds for every $u \in \Gamma_{t,x}$, taking the infimum over $u$ gives

\[
F(x,t) \le \inf_{u \in \Gamma_{t,x}} \{ c_t(x,u) + F(f(x,u,t),t+1) \}.
\]

■

Note: Equation (3) is often referred to as Bellman’s equation
and a function $F$ which satisfies Bellman’s equation is often
referred to as a “cost to go” function. Prop. 1 shows that
problems of the form $P(t_0,x_0)$ admit a solution to Bellman’s
equation which in turn gives the optimal objective value of the
problem. Furthermore, for problems $P(t_0,x_0)$, the solution to
Bellman’s equation can be obtained recursively backwards
in time using a minimization on $u$. When $x$ and $u$ are
discrete, the RHS of Eqn. (3) takes a finite number of values
and minimization over these values is trivial. When the
variables are continuous, finding a functional form for the
minimization step is more challenging. In either case, a
solution to Bellman’s equation provides a state-feedback law
or optimal policy as follows.

Corollary 1: Consider $P(t_0,x_0)$ in (2). Suppose $F(x,t)$
satisfies Equation (3) for $P(t_0,x_0)$. Then

\[
\pi^*(x,t) = \arg\inf_{u \in \Gamma_{t,x}} \{ c_t(x,u) + F(f(x,u,t),t+1) \}
\]

is an optimal policy.
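
As a minimal sketch of how Proposition 1 and Corollary 1 are used when $x$ and $u$ are discrete, the following Python fragment tabulates $F$ backwards in time and reads off a minimizing input at each state. The toy instance (integer states, three inputs, the particular stage and terminal costs) and all function names are illustrative assumptions and do not come from the paper; the brute-force routine is included only as a cross-check.

```python
from itertools import product

# Hypothetical toy instance of P(t0, x0); every number here is illustrative.
T = 3
STATES = [0, 1, 2, 3]          # the set X
INPUTS = [-1, 0, 1]            # the set U

def f(x, u, t):
    return x + u               # dynamics x(t+1) = f[x(t), u(t), t]

def stage_cost(x, u, t):
    return (x - 2) ** 2 + abs(u)   # c_t(x, u)

def terminal_cost(x):
    return (x - 2) ** 2            # c_T(x)

def feasible_inputs(x, t):
    # Gamma_{t,x}: inputs that keep the next state inside X
    return [u for u in INPUTS if f(x, u, t) in STATES]

def solve_bellman():
    # F[t][x] is the cost-to-go; policy[t][x] is a minimizing input
    F = {T: {x: terminal_cost(x) for x in STATES}}
    policy = {}
    for t in range(T - 1, -1, -1):
        F[t], policy[t] = {}, {}
        for x in STATES:
            best_u = min(feasible_inputs(x, t),
                         key=lambda u: stage_cost(x, u, t) + F[t + 1][f(x, u, t)])
            policy[t][x] = best_u
            F[t][x] = stage_cost(x, best_u, t) + F[t + 1][f(x, best_u, t)]
    return F, policy

def brute_force(x0):
    # Exhaustive search over all input sequences, for comparison only
    best = float("inf")
    for u_seq in product(INPUTS, repeat=T):
        x, cost, ok = x0, 0, True
        for t, u in enumerate(u_seq):
            if f(x, u, t) not in STATES:
                ok = False
                break
            cost += stage_cost(x, u, t)
            x = f(x, u, t)
        if ok:
            best = min(best, cost + terminal_cost(x))
    return best

F, policy = solve_bellman()
```

The backward pass visits each (time, state, input) triple once, which is the linear-in-$T$ complexity mentioned above, whereas the exhaustive search grows exponentially in $T$.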

Dynamic Programming with Supremum Terms. In this
paper we consider the special class of indexed optimization
problems, $S(t_0,x_0)$. In contrast to problems of the form
$P(t_0,x_0)$ in (2), class $S(t_0,x_0)$ has supremum (or maximum)
terms in the objective. Specifically, these problems have the
following form.

\[
\begin{aligned}
\min_{u,x}\; & J_{t_0,x_0}(u,x) := \sum_{t=t_0}^{T-1} c_t(x(t),u(t)) + c_T(x(T)) + \sup_{t_0 \le k \le T} d_k(x(k)) \\
\text{subject to: } & x(t+1) = f[x(t),u(t),t], \\
& x(t_0) = x_0 \text{ given}, \\
& x(t) \in X \text{ for } t = t_0,\dots,T, \\
& u(t) \in U \text{ for } t = t_0,\dots,T-1.
\end{aligned} \tag{5}
\]

Lemma 2: The class of optimization problems in (5) does 
not satisfy the principle of optimality. 

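Proof: We give a counterexample. For $h > 0$, we
consider the following problem $S(0,0)$:

\[
\begin{aligned}
\min_{u \in \mathbb{R}^3,\, x \in \mathbb{R}^4}\; & \sum_{t=0}^{2} c_t(u(t)) + \sup_{0 \le k \le 3} x(k) \\
\text{subject to: } & x(t+1) = x(t) + u(t), \quad x(0) = 0, \\
& 0 \le x(t) \le h, \\
& u(t) \in \{-h,0,h\}.
\end{aligned}
\]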



TABLE I
Cost of each feasible input sequence used in the counterexample in Lemma 2

feasible u      objective value     feasible u      objective value
(0, 0, 0)       0                   (h, 0, -h)      h/2
(0, 0, h)       h/2                 (h, 0, 0)       0
(0, h, 0)       2h                  (h, -h, 0)      -h
(0, h, -h)      (5/2)h              (h, -h, h)      -(3/2)h

Here we define $c_0(u(0)) = -u(0)$, $c_1(u(1)) = u(1)$,
$c_2(u(2)) = -u(2)/2$.

Since $u \in \{-h,0,h\}^3$, there are 27 input sequences, only 8
of which are feasible. In Table I, we calculate the objective
value of each feasible input sequence and deduce that the optimal
input is $u = (h,-h,h)$. Now suppose we follow this input
sequence until $t = 2$, yielding $x(2) = 0$. Now we examine the
problem $S(2,x(2))$:

\[
\begin{aligned}
\min_{u \in \mathbb{R},\, x \in \mathbb{R}^2}\; & c_2(u(2)) + \sup_{2 \le k \le 3} x(k) \\
\text{subject to: } & x(t+1) = x(t) + u(t), \quad x(2) = 0, \\
& 0 \le x(t) \le h, \\
& u(t) \in \{-h,0,h\}.
\end{aligned}
\]

For this sub-problem, there are two feasible inputs: $u(2) \in \{h, 0\}$.
Of these, the latter is optimal (objective value $h/2$
vs. $0$). Thus we see that although $u = (h,-h,h)$ and
$x = (0,h,0,h)$ solve $S(0,0)$, the tail input $v = (h)$ and tail state
sequence $(0,h)$ do not solve $S(2,0)$. ■
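
The counterexample is small enough to check by exhaustive enumeration. The sketch below is not part of the paper: it fixes $h = 1$ as an arbitrary choice, and the helper names `rollout` and `solve` are hypothetical. It enumerates every input sequence for $S(0,0)$ and for the tail problem $S(2,0)$ and keeps the cheapest feasible one.

```python
from itertools import product

h = 1.0  # any h > 0 works; 1.0 is an arbitrary choice

def stage_cost(t, u):
    # c_0(u) = -u, c_1(u) = u, c_2(u) = -u/2, as defined in the proof
    return (-u, u, -u / 2)[t]

def rollout(x0, u_seq):
    # Returns the state trajectory, or None if a state leaves [0, h]
    xs = [x0]
    for u in u_seq:
        xs.append(xs[-1] + u)
        if not 0 <= xs[-1] <= h:
            return None
    return xs

def solve(t0, x0, T=3):
    # Enumerate every input sequence for S(t0, x0) and keep the cheapest
    best = None
    for u_seq in product((-h, 0.0, h), repeat=T - t0):
        xs = rollout(x0, u_seq)
        if xs is None:
            continue
        cost = sum(stage_cost(t0 + i, u) for i, u in enumerate(u_seq)) + max(xs)
        if best is None or cost < best[0]:
            best = (cost, u_seq)
    return best

full_cost, full_u = solve(0, 0.0)   # the full problem S(0, 0)
tail_cost, tail_u = solve(2, 0.0)   # the tail problem S(2, 0)
```

Comparing `full_u[2:]` with `tail_u` shows the mismatch used in the proof: the last input of the full optimum differs from the optimum of the tail problem, because the tail problem no longer sees the peak $x(1) = h$ that has already been incurred.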

III. Solution Methodology: Augmented Dynamic 
Programming 

In this section we will define what a forward separable 
objective function is and later show that the supremum 
is an example of such a function. We will show that for 
dynamic programming problems with a forward separable 
objective function, augmenting the state variables allows us 
to use standard dynamic programming techniques to solve 
the problem. 

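Definition 4 ([21]): The function $J(u,x)$ is said to be
forward separable if there exist functions $\phi_0(x,u)$, $\phi_T(x,\phi_{T-1})$,
and $\phi_i(x,u,\phi_{i-1})$ for $i = 1,\dots,T-1$ such that

\[
J(u,x) = \phi_T\big(x(T),\phi_{T-1}[x(T-1),u(T-1),\phi_{T-2}\{\dots,
\phi_2\{x(2),u(2),\phi_1\{x(1),u(1),\phi_0\{x(0),u(0)\}\}\},\dots\}]\big) \tag{6}
\]

where $\phi_i : \mathbb{R}^n \times \mathbb{R}^p \times \mathbb{R}^q \to \mathbb{R}^q$ for $i = 1,\dots,T-1$,
$\phi_T : \mathbb{R}^n \times \mathbb{R}^q \to \mathbb{R}$ and $\phi_0 : \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^q$.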

Clearly, any objective function of the form

\[
J(u,x) = \sum_{t=t_0}^{T-1} c_t(x(t),u(t)) + c_T(x(T))
\]

is forward separable using $\phi_0(x,u) = c_0(x,u)$,
$\phi_T(x,\phi_{T-1}) = c_T(x) + \phi_{T-1}$ and

\[
\phi_i(x,u,\phi_{i-1}) = c_i(x,u) + \phi_{i-1} \quad \text{for } i = 1,\dots,T-1.
\]


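In addition, it can be shown that the sum of any number
of forward separable functions is forward separable. For
example, let $J_1(u,x)$ and $J_2(u,x)$ be forward separable with
associated $\phi^1_i = g_i$ and $\phi^2_i = h_i$ respectively. Then $J_1 + J_2$ is
forward separable with

\[
\phi_i(x,u,\phi_{i-1}) =
\begin{bmatrix} \phi^1_i(x,u,\phi^1_{i-1}) \\ \phi^2_i(x,u,\phi^2_{i-1}) \end{bmatrix}
= \begin{bmatrix} g_i(x,u,\phi^1_{i-1}) \\ h_i(x,u,\phi^2_{i-1}) \end{bmatrix}, \tag{7}
\]

\[
\phi_T(x,\phi_{T-1}) = g_T(x,\phi^1_{T-1}) + h_T(x,\phi^2_{T-1}),
\]

and

\[
\phi_0(x,u) =
\begin{bmatrix} \phi^1_0(x,u) \\ \phi^2_0(x,u) \end{bmatrix}
= \begin{bmatrix} g_0(x,u) \\ h_0(x,u) \end{bmatrix}.
\]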


We now show that the supremum (maximum) function is 
forward separable. 

Lemma 3:

\[
J(u,x) = \max\Big\{ \sup_{0 \le k \le T-1} c_k(x(k),u(k)),\; c_T(x(T)) \Big\}
\]

is a forward separable objective function.

Proof:

\[
\begin{aligned}
J(u,x) &= \max\Big\{ \sup_{0 \le k \le T-1} c_k(x(k),u(k)),\; c_T(x(T)) \Big\} \\
&= \max\{ c_T(x(T)), \max\{ c_{T-1}(x(T-1),u(T-1)), \dots, \\
&\qquad \max\{ c_1(x(1),u(1)), c_0(x(0),u(0)) \} \dots \} \}
\end{aligned}
\]

so that

\[
\phi_i(x,u,\phi_{i-1}) = \max(c_i(x,u),\phi_{i-1}), \quad \phi_0(x,u) = c_0(x,u), \quad
\phi_T(x,\phi_{T-1}) = \max(c_T(x),\phi_{T-1}).
\]
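
The recursion in Lemma 3 is a left fold of `max` over the stage values. The short sketch below illustrates this under the assumption that the stage costs $c_0,\dots,c_T$ have already been evaluated along one trajectory; the list of numbers and the function names are hypothetical.

```python
from functools import reduce

def phi0(c0):
    return c0                      # phi_0(x, u) = c_0(x, u)

def phi(c_i, prev):
    return max(c_i, prev)          # phi_i(x, u, phi_{i-1}) = max(c_i(x, u), phi_{i-1})

def forward_separable_max(costs):
    # costs = [c_0, ..., c_T], already evaluated along one trajectory.
    # Each step needs only the previous value, never the whole history.
    return reduce(lambda prev, c: phi(c, prev), costs[1:], phi0(costs[0]))

costs = [0.3, 1.7, 0.9, 1.2]       # illustrative values, not from the paper
running = forward_separable_max(costs)
```

The point of the fold is that the only information carried forward from step $i-1$ to step $i$ is the single number $\phi_{i-1}$, which is what makes it usable as an extra state variable.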


A. Forward Separable Dynamic Programming 

We may now define the class of indexed forward separable
problems $H(t_0,x_0)$ so that $H$ is of class $G$, but not of class
$P$, and has the form:

\[
\begin{aligned}
\min_{u,x}\; & J_{t_0,x_0}(u,x) \\
\text{subject to: } & x(t+1) = f[x(t),u(t),t], \\
& x(0) = x_0, \\
& x(t) \in X \subset \mathbb{R}^n \text{ for } t = 1,\dots,T, \\
& u(t) \in U \subset \mathbb{R}^p \text{ for } t = 0,\dots,T-1,
\end{aligned} \tag{8}
\]

where $J_{t_0,x_0}$ is forward separable with associated $\phi_i$. For
every instance of a forward separable dynamic programming
problem $H(t_0,x_0)$, we may associate a new optimization
problem $A(t_0,x_0)$, which is equivalent to $H(t_0,x_0)$ in a certain
sense and which satisfies the principle of optimality. $A(t_0,x_0)$
is defined as follows.




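\[
\min_{u,z} L_{t_0,x_0}(u,z) = z_2(T+1)
\]

subject to:

\[
\begin{aligned}
\begin{bmatrix} z_1(t+1) \\ z_2(t+1) \end{bmatrix}
&= \begin{bmatrix} f(z_1(t),u(t),t) \\ \phi_t(z_1(t),u(t),z_2(t)) \end{bmatrix}
\quad \text{for } t = 1,\dots,T-1, \\
\begin{bmatrix} z_1(1) \\ z_2(1) \end{bmatrix}
&= \begin{bmatrix} f(z_1(0),u(0),0) \\ \phi_0(z_1(0),u(0)) \end{bmatrix}, \\
\begin{bmatrix} z_1(T+1) \\ z_2(T+1) \end{bmatrix}
&= \begin{bmatrix} z_1(T) \\ \phi_T(z_1(T),z_2(T)) \end{bmatrix}, \\
\begin{bmatrix} z_1(0) \\ z_2(0) \end{bmatrix}
&= \begin{bmatrix} x_0 \\ 0 \end{bmatrix}, \\
z_1(t) &\in X \text{ for } t = 1,\dots,T, \\
u(t) &\in U \text{ for } t = 0,\dots,T-1,
\end{aligned} \tag{9}
\]

where the solution to $H(t_0,x_0)$ can be recovered as
$x(t) = z_1(t)$.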


Lemma 4: Suppose $J_{t_0,x_0}(u,x)$ is forward separable with
associated $\phi_i$. Then $J^*_{t_0,x_0} = L^*_{t_0,x_0}$. Furthermore, suppose $u$ and
$x$ solve $H(t_0,x_0)$ and $w$ and $z$ solve $A(t_0,x_0)$. Then $u = w$
and $x(t) = z_1(t)$ for all $t$.

Proof: Suppose $w$ and $z$ solve $A(t_0,x_0)$. First we show
that $w$ and $z_1$ are feasible for $H(t_0,x_0)$. Clearly $w(t) \in U$
for all $t$, and if we let $u = w$ then $x(0) = x_0$ and
$x(t+1) = f[x(t),u(t),t]$ for all $t$. Since likewise $z_1(0) = x_0$ and
$z_1(t+1) = f[z_1(t),u(t),t]$, we have $x(t) = z_1(t) \in X$ for all $t$.
Hence $u$ and $x = z_1$ are feasible for $H(t_0,x_0)$. Likewise, if $u$
and $x$ solve $H(t_0,x_0)$, then if we let $w = u$ and $z_1 = x$ and
define $z_2(t+1) = \phi_t(z_1(t),u(t),z_2(t))$, $z_2(1) = \phi_0(z_1(0),u(0))$,
$z_2(0) = 0$, then $w$ and $z$ are feasible. Furthermore, in both
cases, we may examine the objective value

\[
J(u,x) = \phi_T\big(z_1(T),\phi_{T-1}[z_1(T-1),w(T-1),\phi_{T-2}\{\dots,
\phi_2\{z_1(2),w(2),\phi_1\{z_1(1),w(1),\phi_0\{z_1(0),w(0)\}\}\},\dots\}]\big).
\]

However, we now observe

\[
\begin{aligned}
z_2(T+1) &= \phi_T(z_1(T),z_2(T)) \\
z_2(T) &= \phi_{T-1}(z_1(T-1),w(T-1),z_2(T-1)) \\
&\;\;\vdots \\
z_2(2) &= \phi_1(z_1(1),w(1),z_2(1)) \\
z_2(1) &= \phi_0(z_1(0),w(0)).
\end{aligned}
\]

Hence we have

\[
\begin{aligned}
L(w,z) &= z_2(T+1) \\
&= \phi_T\big(z_1(T),\phi_{T-1}(z_1(T-1),w(T-1),\dots
\phi_1(z_1(1),w(1),\phi_0(z_1(0),w(0)))\dots)\big) \\
&= J(u,x).
\end{aligned}
\]

Hence if $w$ and $z$ solve $A(t_0,x_0)$ with objective $L^*_{t_0,x_0} = z_2(T+1)$,
then $w$ and $z_1$ solve $H(t_0,x_0)$ with objective value
$J^*_{t_0,x_0} = L^*_{t_0,x_0} = z_2(T+1)$. ■

Proposition 2: The augmented optimization problem
$A(t_0,x_0)$ in (9) satisfies the principle of optimality and the
Bellman equation (3).

Proof: $A(t_0,x_0)$ is a special case of $P(t_0,x_0)$ where
$c_i = 0$ for $i \ne T$. ■

To understand the augmented approach intuitively, we 
note that dynamic programming breaks a multi-period 
planning problem into simpler optimization problems at 
each stage. However, for non-separable problems, to make 
the correct decision at each stage we need historical data. 
In this context, the extra augmented state contains that part 
of the history necessary to make the correct decision at the 
present time. 
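
To make this concrete, the following sketch applies the augmented recursion to the counterexample of Lemma 2. It is an illustrative implementation under stated assumptions rather than the authors' code: $h$ is fixed to 1, the augmented state is stored as the pair (running sum of stage costs, running maximum of the state), and the function name `cost_to_go` is hypothetical. The terminal map adds the two components, mirroring $\phi_T = g_T + h_T$.

```python
from functools import lru_cache

h = 1.0                      # illustrative value of h
T = 3
INPUTS = (-h, 0.0, h)

def stage_cost(t, u):
    return (-u, u, -u / 2)[t]          # c_0, c_1, c_2 from Lemma 2

@lru_cache(maxsize=None)
def cost_to_go(t, x, run_sum, run_max):
    # Augmented state: x plays the role of z_1, (run_sum, run_max) of z_2.
    run_max = max(run_max, x)          # fold the current state into the running supremum
    if t == T:
        return run_sum + run_max, ()   # terminal map: add the two components
    best = None
    for u in INPUTS:
        nxt = x + u
        if not 0 <= nxt <= h:
            continue                   # input is not in Gamma_{t,x}
        val, tail = cost_to_go(t + 1, nxt, run_sum + stage_cost(t, u), run_max)
        if best is None or val < best[0]:
            best = (val, (u,) + tail)
    return best

# Start with an empty history: zero accumulated cost, no peak seen yet.
opt_cost, opt_u = cost_to_go(0, 0.0, 0.0, float("-inf"))

# Tail problem started from the augmented state reached at t = 2 along the
# optimal trajectory (accumulated cost -2h, peak h already incurred).
tail_cost, tail_u = cost_to_go(2, 0.0, -2.0, 1.0)
```

Unlike the tail problem $S(2,0)$, the tail problem here is started from the augmented state, so it remembers that a peak of $h$ has already been paid for and its minimizer agrees with the last input of the full optimum.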

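Corollary 2: $S(t_0,x_0)$ is a special case of $H(t_0,x_0)$.

Proof: Consider the objective function from Problem
$S(t_0,x_0)$ as

\[
J_{t_0,x_0}(u,x) = \sum_{t=t_0}^{T-1} c_t(x(t),u(t)) + c_T(x(T)) + \sup_{t_0 \le k \le T} d_k(x(k)).
\]

Now this is the sum of two forward separable functions.
As per the previous discussion (7), then, we define

\[
g_i(x,u,\phi^1_{i-1}) = c_i(x,u) + \phi^1_{i-1} \text{ for } i = 1,\dots,T-1, \quad
g_0(x,u) = c_0(x,u), \quad g_T(x,\phi^1_{T-1}) = c_T(x) + \phi^1_{T-1}
\]

and

\[
h_i(x,u,\phi^2_{i-1}) = \max(d_i(x,u),\phi^2_{i-1}), \quad h_0(x,u) = d_0(x,u), \quad
h_T(x,\phi^2_{T-1}) = \max(d_T(x),\phi^2_{T-1}).
\]

Then

\[
\phi_i(x,u,\phi_{i-1}) =
\begin{bmatrix} \phi^1_i(x,u,\phi^1_{i-1}) \\ \phi^2_i(x,u,\phi^2_{i-1}) \end{bmatrix}
= \begin{bmatrix} g_i(x,u,\phi^1_{i-1}) \\ h_i(x,u,\phi^2_{i-1}) \end{bmatrix},
\qquad
\phi_T(x,\phi_{T-1}) = g_T(x,\phi^1_{T-1}) + h_T(x,\phi^2_{T-1}),
\]

and

\[
\phi_0(x,u) =
\begin{bmatrix} \phi^1_0(x,u) \\ \phi^2_0(x,u) \end{bmatrix}
= \begin{bmatrix} g_0(x,u) \\ h_0(x,u) \end{bmatrix}
\]

establish forward separability of $J_{t_0,x_0}$ as per (6). ■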

The $\phi_i$ specified in the proof of this Corollary define an
instance of problem $H(t_0,x_0)$, which was shown to be
equivalent to a class of optimization problems $A(t_0,x_0)$ by
Lemma 4. Since problems of class $A(t_0,x_0)$ satisfy the
principle of optimality, they can be solved using dynamic
programming and their solution yields a solution to the
original Problem $S(t_0,x_0)$. In the following section, we will
apply this technique to optimal battery scheduling in the
presence of demand charges.


IV. Application to the Energy Storage problem 

In this section, we apply the augmented dynamic program- 
ming methodology to optimal scheduling of batteries in the 
presence of demand charges. We first propose a simple model 
for the dynamics of the battery storage. We then formulate 
the objective function using electricity pricing plans which 
include demand charges. We see that the system described
becomes an optimization problem of the form $H(0,e_0)$ in (8).



A. Battery Dynamics 

We will model the energy stored in the battery by the
difference equation:

\[
e(k+1) = \alpha\,(e(k) + \eta\,u(k)\,\Delta t) \tag{10}
\]

where $e(k)$ denotes the energy stored in the battery at time
step $k$, $\alpha$ is the bleed rate of the battery, $\eta$ is the efficiency
of the battery, $u(k)$ denotes the charging/discharging ($+/-$)
at time step $k$ and $\Delta t$ is the amount of time passed between
each time step. Moreover we denote the maximum charge
and discharge rate by $\bar u$ and $\underline u$ respectively. Thus we have
the constraint that $u(k) \in [\underline u, \bar u] := U$ for all $k$. Similarly we
also add the constraint $e(k) \in [\underline e, \bar e] := X$ for all $k$, where $\underline e$
and $\bar e$ are the capacity constraints of the battery (typically
$\underline e = 0$).
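
A minimal simulation of the difference equation (10) with the constraint sets $U$ and $X$ might look as follows. The numerical constants are copied from Table II with the units as listed there, the lower capacity limit is taken as 0, and the function names and the sample input sequence are illustrative assumptions.

```python
ALPHA = 0.999791667              # bleed rate (Table II)
ETA = 0.92                       # efficiency (Table II)
DT = 0.5                         # time step in hours (Table II)
U_MIN, U_MAX = -4000.0, 4000.0   # discharge / charge limits (Table II)
E_MIN, E_MAX = 0.0, 8000.0       # capacity limits (Table II, lower limit assumed 0)

def battery_step(e, u):
    """One step of e(k+1) = alpha * (e(k) + eta * u(k) * dt)."""
    if not U_MIN <= u <= U_MAX:
        raise ValueError("u(k) outside U")
    e_next = ALPHA * (e + ETA * u * DT)
    if not E_MIN <= e_next <= E_MAX:
        raise ValueError("e(k+1) outside X")
    return e_next

def simulate(e0, inputs):
    # Roll the difference equation forward from e(0) = e0
    traj = [e0]
    for u in inputs:
        traj.append(battery_step(traj[-1], u))
    return traj

# Hypothetical schedule: charge at the maximum rate twice, then discharge.
traj = simulate(0.0, [4000.0, 4000.0, -2000.0])
```

Raising an error on constraint violation, rather than clipping, keeps the sketch faithful to the feasibility definition: an input is simply not admissible if it drives the state outside $X$.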

B. The objective function 

Let us denote by $q(k)$ the power supplied by the grid
at time step $k$:

\[
q(k) = q_a(k) - q_s(k) + u(k) \tag{11}
\]

where $q_a(k)$ is the power consumed by HVAC/appliances
at time step $k$ and $q_s(k)$ is the power supplied by solar
photovoltaics at time step $k$. For now, it is assumed that
both are known a priori.

To define the cost of electricity we divide the day
$t \in [0,T]$ into on-peak and off-peak periods. We define
an off-peak period starting from 12am till $t_{on}$ and from $t_{off}$
till 12am. We define an on-peak period from $t_{on}$
till $t_{off}$. The Time-of-Use (TOU, $ per kWh) electricity
cost during on-peak and off-peak is denoted by $p_{on}$ and
$p_{off}$ respectively. We further simplify this as $p_k = p_{on}$ if
$k \in T_{on}$ and $p_k = p_{off}$ if $k \in T_{off}$, where $T_{on}$ and $T_{off}$
are the on-peak and off-peak hours, respectively. These
TOU charges define the first part of the objective function as:

\[
\begin{aligned}
J_E(u,e) &= p_{off} \sum_{k=0}^{t_{on}-1} q(k)\Delta t + p_{on} \sum_{k=t_{on}}^{t_{off}-1} q(k)\Delta t + p_{off} \sum_{k=t_{off}}^{T} q(k)\Delta t \\
&= \sum_{k \in [0,T]} p_k (q_a(k) - q_s(k) + u(k))\Delta t \\
&= \sum_{k \in [0,T]} p_k (q_a(k) - q_s(k))\Delta t + \sum_{k \in [0,T]} p_k u(k)\Delta t
\end{aligned}
\]

where the daily terminal time step is $T = 24/\Delta t$. Clearly,
only the second term in this objective function is significant
for the purposes of optimization.

We also include a demand charge, which is a cost
proportional to the maximum rate of power taken from the
grid during on-peak times. This cost is determined by $p_d$,
which is the price in $ per kW. Thus it follows that the demand
charge will be:


TABLE II
List of constant values (prices correspond to Salt River
Project E21 price plan)

Constant          Value                   Constant      Value
$\alpha$          0.999791667 (W/h)       $t_{off}$     41
$\eta$            0.92 (%)                $p_{on}$      0.0633 x 10^{-3} ($/KWh)
$\bar u$          4000 (Wh)               $p_{off}$     0.0423 x 10^{-3} ($/KWh)
$\underline u$    -4000 (Wh)              $p_d$         3.364 ($/KWh)
$\bar e$          8000 (Wh)               $\Delta t$    0.5 (h)
$t_{on}$          27



\[
\begin{aligned}
J_D(u,e) &= p_d \sup_{k \in \{t_{on},\dots,t_{off}-1\}} q(k) \\
&= p_d \sup_{k \in \{t_{on},\dots,t_{off}-1\}} \{ q_a(k) - q_s(k) + u(k) \}
\end{aligned}
\]
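
As an illustration of how the two cost terms are evaluated for a given schedule, the sketch below computes $J_E$ and $J_D$ from per-step lists of $u$, $q_a$ and $q_s$. The function name, argument order and the four-step example day with its prices are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the Table II values.

```python
def electricity_cost(u, q_a, q_s, p_on, p_off, p_d, t_on, t_off, dt):
    """Return (J_E, J_D) for one day of per-step power values.

    u, q_a, q_s are equal-length lists indexed by time step k; steps
    t_on, ..., t_off - 1 are treated as on-peak.
    """
    q = [a - s + b for a, s, b in zip(q_a, q_s, u)]        # grid power, eq. (11)
    on_peak = range(t_on, t_off)
    j_e = sum((p_on if k in on_peak else p_off) * q[k] * dt for k in range(len(q)))
    j_d = p_d * max(q[k] for k in on_peak)                 # demand charge
    return j_e, j_d

# Hypothetical 4-step day: constant 2 kW load, no solar, on-peak during steps 1-2.
q_a = [2.0, 2.0, 2.0, 2.0]
q_s = [0.0, 0.0, 0.0, 0.0]
idle = electricity_cost([0.0, 0.0, 0.0, 0.0], q_a, q_s, 0.2, 0.1, 3.0, 1, 3, 1.0)
shift = electricity_cost([1.0, -1.0, -1.0, 1.0], q_a, q_s, 0.2, 0.1, 3.0, 1, 3, 1.0)
```

In this toy case, charging off-peak and discharging on-peak lowers both terms, and the reduction in the demand charge dominates, which is the effect the battery schedule is meant to exploit.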

C. 24 hr Optimal Residential Battery Storage Problem 

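We may now define the problem of optimal battery
scheduling in the presence of demand and Time-of-Use
charges, denoted $D(0,e_0)$.

\[
\begin{aligned}
\min_{u,e}\; & \{ J_E(u,e) + J_D(u,e) \} \quad \text{subject to} \\
& e(k+1) = \alpha\,(e(k) + \eta\,u(k)\,\Delta t) \text{ for } k = 0,\dots,T, \\
& e(k) \in X \text{ for } k = 0,\dots,T, \\
& u(k) \in U \text{ for } k = 0,\dots,T, \\
& e(0) = e_0,
\end{aligned}
\]

where recall $U := [\underline u, \bar u]$ and $X := [\underline e, \bar e]$.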

Proposition 3: Problem $D(0,e_0)$ is a special case of
$S(t_0,x_0)$.

Proof: Let $c_i(u(i)) = p_i (q_a(i) - q_s(i) + u(i))\Delta t$ and

\[
d_k(u(k)) =
\begin{cases}
p_d (q_a(k) - q_s(k) + u(k)) & k \in T_{on} \\
0 & \text{otherwise.}
\end{cases}
\]


We conclude that our algorithmic approach to forward
separable dynamic programming can be applied to this problem
as per Corollary 2. That is, it can be represented as an
augmented dynamic programming problem of the form $A(t_0,x_0)$.

V. Numerical Implementation 

To illustrate our approach to generalized dynamic
programming, we use solar and usage data obtained from local
utility Salt River Project in Tempe, AZ. We also use pricing
data from SRP and battery data obtained for the Tesla
Powerwall. As is standard practice, for implementation, we
used a discrete input and state space. The results of the
simulation are shown in Fig. 2. These results show a slight
improvement in accuracy over results obtained based on the
approach to a similar problem in [18] (approximately $0.98
savings).





Fig. 1. The trajectory the algorithm produces for randomly generated
stochastic solar data. The supremum of the power is 1.66 kW and the cost
is $64.9889.



Fig. 2. The trajectory the algorithm produces for deterministic solar data.
The supremum of the power is 0.7033 kW and the cost is $46.389.


VI. Using a Stochastic Model 

To show that this approach can also be extended to
stochastic dynamic programming and to evaluate the effect
of stochastic uncertainty on battery scheduling, we identified
a Gauss-Markov model of solar generation based on SRP
data. We then used a trivial extension of problem $A(t_0,x_0)$
to problems with stochastic disturbances.


A. Solar Generation Model 


Our approach to modeling the dynamics of load following
for a given subset of data is to model solar irradiance directly
as a primary variable along with other possible correlated
variables such as temperature or 2-hr deltas in pressure.
Specifically, we take time-series data of these quantities,
denoted $W_i(t)$, and normalize this data as

\[
w_i(t) = \frac{W_i(t) - \mu_i(t)}{\sigma_i(t)}
\]

where $\mu_i(t)$ is the average historic and clear-sky mean of the
variable $W_i$ at time step $t$ and $\sigma_i(t)$ is the standard deviation
of variable $W_i$ at time step $t$.

The generating process is then given by:

\[
\begin{aligned}
& w(t) = A w(t-1) + B \varepsilon(t-1) \text{ for } t = 1,\dots,T, \\
& \text{where } w(t) \in \mathbb{R}^3, \quad w(0) = 0, \\
& \varepsilon(t) \sim N(0,\Sigma), \quad \Sigma_{ij} = \delta_{ij} :=
\begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}
\end{aligned}
\]

where the matrices $A$ and $B$ are chosen to preserve the lag
0 and lag 1 cross-correlations seen in the collected data.
Specifically, we can compute these matrices as in [22]:

\[
A = M_1 M_0^{-1}, \qquad B B^T = M_0 - M_1 M_0^{-1} M_1^T
\]

where $M_i$ is the $i$-lag cross-correlation matrix, so $(M_i)_{m,n} = \rho_i(m,n)$,
where $\rho_i(m,n)$ is the cross-correlation coefficient
between variables $m$ and $n$ with variable $n$ lagged by $i$ time
steps. Then, adding back in the mean and deviation, we
obtain the power supplied by solar at time step $k$ as

\[
q_s(k) = w_1(k)\,\sigma_1(k) + \mu_1(k).
\]
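
In the scalar case the matrix formulas reduce to $a = m_1/m_0$ and $b = \sqrt{m_0 - m_1^2/m_0}$, which makes the construction easy to illustrate without a linear algebra library. The sketch below is a simplification of the three-variable model described above: the lag correlations, the mean and deviation profiles, and the function names are all hypothetical.

```python
import math
import random

def ar1_coefficients(m0, m1):
    # Scalar version of A = M1 M0^{-1} and B B^T = M0 - M1 M0^{-1} M1^T
    a = m1 / m0
    b = math.sqrt(m0 - m1 * m1 / m0)
    return a, b

def simulate_solar(a, b, mu, sigma, seed=0):
    # w(t) = a w(t-1) + b eps(t-1), w(0) = 0, then q_s(k) = w(k) sigma(k) + mu(k)
    rng = random.Random(seed)
    w, q_s = 0.0, []
    for mu_k, sigma_k in zip(mu, sigma):
        q_s.append(w * sigma_k + mu_k)
        w = a * w + b * rng.gauss(0.0, 1.0)
    return q_s

a, b = ar1_coefficients(m0=1.0, m1=0.8)      # illustrative lag-0 / lag-1 correlations
q_s = simulate_solar(a, b, mu=[0.0, 1.0, 2.0, 1.0], sigma=[0.1, 0.3, 0.5, 0.3])
```

With these coefficients the stationary variance of $w$ is $b^2/(1-a^2) = m_0$, so the normalized process keeps the lag-0 and lag-1 statistics it was fitted to; the matrix version would obtain $B$ from a factorization of $M_0 - M_1 M_0^{-1} M_1^T$.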

B. Augmented Stochastic Dynamic Programming 

We now define a class of stochastic dynamic programming
problems, $T(t_0,x_0)$, of the form

\[
\begin{aligned}
\min_{u}\; & J_{t_0,x_0}(u,x) = E_v\Big( \sum_{t=t_0}^{T-1} c_t(x(t),u(t)) + c_T(x(T)) \Big) && (12) \\
\text{subject to: } & x(t+1) = f[x(t),u(t),t,v(t)], \quad \text{given } x(t_0) = x_0, \\
& x(t) \in X \text{ for } t = t_0+1,\dots,T, \\
& u(t) \in U \text{ for } t = t_0,\dots,T-1, \\
& v(t) \sim N(0,\Sigma), \quad \Sigma_{ij} = \delta_{ij}. && (13)
\end{aligned}
\]

As shown in [20], a stochastic version of the Bellman
equation can be used to solve stochastic dynamic programming
problems of the form $T(t_0,x_0)$. Specifically, suppose
that $F$ satisfies $F(x,T) = c_T(x)$ and

\[
F(x,t) = \inf_{u} \{ c_t(x,u) + E_v[F(f[x,u,t,v],t+1) \mid x,u] \}. \tag{14}
\]

Then for problem $T(t_0,x_0)$, $F(x,t) = J^*_{t,x}$ and
$\pi^*(x,t) = \arg\inf_u \{ c_t(x,u) + E_v[F(f[x,u,t,v],t+1)] \}$ defines an optimal
policy.

Stochastic Battery Scheduling. We now modify Problem
$D(0,e_0)$ to give a stochastic version of the battery scheduling
problem

\[
\begin{aligned}
\min_{u,e}\; & E\big( \{ J_E(u,e) + J_D(u,e) \} \big) \quad \text{subject to} \\
& e(k+1) = \alpha\,(e(k) + \eta\,u(k)\,\Delta t) \text{ for } k = 0,\dots,T, \\
& w(k+1) = A w(k) + B \varepsilon(k) \text{ for } k = 0,\dots,T, \\
& e(k) \in X \text{ for } k = 0,\dots,T, \\
& u(k) \in U \text{ for } k = 0,\dots,T, \\
& e(0) = e_0, \quad \varepsilon(k) \sim N(0,\Sigma).
\end{aligned}
\]

To solve the stochastic version of $D(0,e_0)$, we augment to
obtain a stochastic version of Problem $A(t_0,x_0)$, which is a
special case of $T(t_0,x_0)$, which then admits a solution using
the stochastic version of Bellman’s equation.


C. Implementation of the Stochastic Algorithm 

The primary challenge with implementation is computing
the expectation in Bellman’s equation (14). Specifically, if $\phi$
is the pdf of $v(t)$, we must compute

\[
E_v[F(f[x,u,t,v],t+1) \mid x,u] = \int F(f(x,u,t,v),t+1)\,\phi(v)\,dv.
\]

To numerically integrate this function, we discretize $u$,
$x$ and $v$ so that the integral becomes a sum where $\phi(v_i)$
is a weighted sample from the normal distribution. The
results of this algorithm are illustrated in Fig. 1 using the
parameter values from Table II. The solar data generated
from this run were then used as input to the deterministic
algorithm in order to compare performance. As expected, the
deterministic case performs better than the stochastic case.
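
One simple way to realize the discretized expectation is a fixed grid of noise values with weights proportional to the standard normal pdf. The sketch below assumes a scalar disturbance, a uniform grid on $[-5,5]$ and hypothetical function names; the paper does not specify its quadrature rule, so this is only one possible choice.

```python
import math

def normal_weights(n=101, width=5.0):
    # Grid v_i on [-width, width] with weights proportional to the N(0,1) pdf
    vs = [-width + 2 * width * i / (n - 1) for i in range(n)]
    pdf = [math.exp(-0.5 * v * v) for v in vs]
    total = sum(pdf)
    return vs, [p / total for p in pdf]

def expected_cost_to_go(F_next, f, x, u, t):
    # Approximates E_v[F(f[x,u,t,v], t+1)] by a weighted sum over the grid
    vs, ws = normal_weights()
    return sum(w * F_next(f(x, u, t, v)) for v, w in zip(vs, ws))

# Illustrative check with F(y) = y^2 and f = x + u + v, for which the exact
# expectation is (x + u)^2 + 1.
approx = expected_cost_to_go(lambda y: y * y, lambda x, u, t, v: x + u + v, 1.0, 0.5, 0)
```

Normalizing the weights so that they sum to one keeps the approximation exact for constant cost-to-go functions, and the same weights can be reused at every state and time step of the backward recursion.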

VII. Conclusion 

In this paper we have proposed a generalized formulation 
of the dynamic programming problem and shown that if 
the objective function is forward separable, these problems 
may be solved using an equivalent augmented dynamic 
programming approach. Furthermore, we have shown that 
the problem of optimal scheduling of battery storage in the 
presence of combined demand and time-of-use charges is a 
special case of this class of forward separable dynamic pro- 
gramming problems. We have further extended these results 
to stochastic dynamic programming with a forward separable 
objective. The proposed algorithms were demonstrated on a
battery scheduling problem using first a deterministic and
then a Gauss-Markov model for solar generation and load.